W3 SERVER SOFTWARE
A W3 server, like the ftp daemon, is a program which responds to an
incoming TCP connection and provides a service to the caller. There
are many varieties of W3 server software to serve different forms of
data.
Basic W3 servers
CERN server     The basic W3 daemon program serves files
                already in hypertext or plain text. This
                daemon is then used as a basis for many other
                types of servers and gateways.
NCSA server     A server for files, written in C, public
                domain. Runs on top of a gopher-style
                database just like "gopherd".
Perl server     From Marc VanHeyningen at Indiana
                University. Written in perl.
Plexus          Tony Sanders' enhanced version of Marc VH's.
MacHTTPD        Server for the Macintosh.
REXX for VM     A server consisting of a small C program
                which passes control to a server written in
                REXX.
Whatever server you are running, you will probably be interested in:
Tools for information providers
Style Guide for Online Hypertext
Making a new server
This daemon is often used as a basis for a more specific server for a
given application. A server which allows a world of data to be seen
as part of the W3 universe is known as a gateway. (Most servers
could therefore be regarded as gateways, but the term implies some
conversion or mapping between dissimilar worlds.) For short
tutorials with examples, see:
Writing a server in C
Writing a server as a script
It is a good idea to pick the basic daemon or one of the servers in
the list as a starting point when making a new server.
Other servers and Gateways
T. Berners-Lee 1
WWW Server Guide) 14 July 1993
These are servers which provide data extracted from other systems.
They are built using code from the basic daemon, or scripts. See
List of Gateways available.
Tim BL
About documents generated from hypertext
Paper manuals generated from hypertext are made for convenience, for
example for reading when one has no computer to turn to. We have
tried to make the hypertext into fairly conventional paper documents,
but they may seem a little strange in some ways.
All the links have been removed. Therefore, it is worth looking at
the table of contents to see what there is in the manual. Something
which is not explained in place may be explained in detail elsewhere.
We have tried to keep related matter together, but sometimes you will
need to check the table of contents to find it.
Please remember that these are for the most part "living documents".
That is, they are constantly changing to reflect current knowledge.
If you see a statement such as "Product xxx does not support this
feature", remember that it was the case when the document was
generated, and may not be the same now. So if in doubt, check the
online version. Of course, the living document may be out of date
too, in which case it is helpful to mail its author.
Tim BL
WWW SERVER USER GUIDE
The basic WWW server allows files and directories in a file system to
be served to the world as menu trees, multimedia, and/or hypertext.
The http daemon, httpd, is a general server program which runs a W3
protocol, "HTTP". This is a TCP/IP based protocol running by
convention on port 80.
In this guide
Distribution    How to get the code.
Compilation     The daemon is compiled in the same way as
                the library and line mode browser -- see WWW
                distributed code.
Installation    How to install a server under the unix
                internet daemon.
Options         Command line options at run time.
Rule File       The format of a rule file. By default,
                /etc/httpd.conf.
Etiquette       Conventions you should follow to make life
                smoother.
Debugging       If it doesn't seem to work.
Known bugs and improvements desired
Change History  List of improvements made and bug fixes.
Related documents
HTML specification  A description of the hypertext markup
                    language used for representing menus, etc.
HTTP specification  A description of the protocol used by the
                    server.
Status of basic WWW server
A basic fast information server for files.
Author            TBL
Status:           Version 2 available by anonymous FTP, with
                  no index search but file access, name mapping
                  and security filter, ability to act as
                  gateway for anything in the WWW library's
                  repertoire, including WAIS.
Plans:            A version which will allow general unix
                  users to set up an index search daemon. As
                  index search tools are not generally
                  available, we may use the NeXT Digital
                  Librarian or WAIS as a basis.
Platforms         Unix, VMS, VM/CMS (VM/XA).
Next Milestone:   Run shell scripts to implement virtual
                  documents and searches.
More information: User guide, Bug list, Internals, Change
                  history.
Wider scope:      W3 servers, Other WWW software
Features include
Installation under inetd or run stand-alone
Can be run stand-alone by normal user
Automatically generates hypertext view of directory tree
Uses "README" files to document directory listings
Handles multiple formats of same file, selects format
appropriate for client capabilities
Document name to filename mapping for longer-lived document
names
Can act as gateway for WAIS, news, etc if needed
WorldWideWeb distributed code
See the CERN copyright. This is the README file which you get when
you unwrap one of our tar files. These files contain information
about hypertext, hypertext systems, and the WorldWideWeb project. If
you have taken this with a .tar file, you will have only a subset of
the files.
THIS FILE IS A VERY ABRIDGED VERSION OF THE INFORMATION AVAILABLE ON
THE WEB. IF IN DOUBT, READ THE WEB DIRECTLY. If you have not got
ANY browser installed yet, do this by telnet to info.cern.ch (no
username or password).
ARCHIVE DIRECTORY STRUCTURE
Under /pub/www, besides this README file, you'll find bin, src and
doc directories. The main archives are as follows:
bin/xxx/bbbb    Executable binaries of program bbbb for
                system xxx. Check what's there before you
                bother compiling. (Note HP700/8800 series is
                "snake")
bin/next/WorldWideWeb_v.vv.tar.Z
                The Hypertext Browser/editor for the NeXT --
                binary.
src/WWWLibrary_v.vv.tar.Z
                The W3 Library. All source, and Makefiles
                for selected systems.
src/WWWLineMode_v.vv.tar.Z
                The Line mode browser - all source, and
                Makefiles for selected systems. Requires the
                Library.
src/WWWDaemon_v.vv.tar.Z
                The HTTP daemon, and WWW-WAIS gateway
                programs. Source. Requires the Library.
src/WWWMailRobot_v.vv.tar.Z
                The Mail Robot.
doc/WWWBook.tar.Z
                A snapshot of our internal documentation -
                we prefer you to access this on line -- see
                warnings below.
BASIC WWW SOFTWARE INSTALLATION FROM SOURCE
This applies to the line mode client and the server. Below, $prod
means LineMode or Daemon depending on which you are building.
Generated Directory structure
The tar files are all designed to be unwrapped in the same (this)
directory. They create different parts of a common directory tree
under that directory. There may be some duplication. They also
generate a few files in this directory: README.*, Copyright.*, and
some installation instructions (.txt).
The directory structure is, for product $prod and machine $WWW_MACH:
WWW/$prod/Implementation
                Source files for a given product
WWW/$prod/Implementation/CommonMakefile
                The machine-independent parts of the Makefile
                for this product
WWW/$prod/$WWW_MACH/
                Area for compiling for a given system
WWW/All/$WWW_MACH/Makefile.include
                The machine-dependent parts of the makefile
                for any product
WWW/All/Implementation/Makefile.product
                A makefile which includes both parts above
                and so can be used from any product, any
                machine.
Compilation on already supported platforms
You must get the WWWLibrary tar file as well as the products you want
and unwrap them all from the same directory.
You must define the environment variable WWW_MACH to be the
architecture of your machine (sun4, decstation, rs6000, sgi, snake,
etc).
In directory WWW, type BUILD.
Compilation on new platforms
If your machine is not on the list:
Make up a new subdirectory of that name under WWW/$prod and
WWW/All, copying the contents of a basically similar
architecture's directory.
Check the WWW/All/$WWW_MACH/Makefile.include for suitable
directory and flag definitions.
Check the file tcp.h for the system-specific include file
coordinates, etc.
Send any changes you have to make back to
www-request@info.cern.ch for inclusion into future releases.
Once you have this set up, type BUILD.
NEXTSTEP BROWSER/EDITOR
The browser for the NeXT consists of the files contained in the
application directory WWW/Next/Implementation/WorldWideWeb.app and is
already compiled. When you install the app, you may want to configure
the default page, WorldWideWeb.app/default.html. This must point to
some useful information! You should keep it up to date with pointers to
info on your site and elsewhere. If you use the CERN home page, note
there is a link at the bottom to the master copy on our server. You
should set up the address of your local news server with
    dwrite WorldWideWeb NewsHost news
replacing the last word with the actual address of your news host.
See Installation instructions.
LINE MODE BROWSER
Binaries of this for some systems are available in /pub/www/bin/.
The binaries can be picked up, set executable, and run immediately.
If there is no binary, see "Installation from source" above.
(See Installation notes.) Do the same thing (in the same
directory) to the WWWLibrary_v.cc.tar.Z file to get the common
library.
You will have an ASCII printable manual in the file
WWW/LineMode/Defaults/line-mode-guide.txt which you can print out at
this stage. This is a frozen copy of some of the online
documentation.
When you install the browser, you may configure a default page. This
is /usr/local/lib/WWW/default.html for the line mode browser. This
must point to some useful information! You should keep it up to date
with pointers to info on your site and elsewhere. If you use the CERN
home page note there is a link at the bottom to the master copy on
our server.
Some basic documentation on the browser is delivered with the home
page in the directory WWW/LineMode/Defaults. A separate tar file of
that directory (WWWLineModeDefaults.tar.Z) is available if you just
want to update that.
The rest of the documentation is in hypertext, and so will be readable
most easily with a browser. We suggest that after installing the
browser, you browse through the basic documentation so that you are
aware of the options and customisation possibilities for example.
SERVER
The server can be run very simply under the internet daemon, to
export a file directory tree as a browsable hypertext tree. Binaries
are available for some platforms; otherwise follow the instructions
above for compiling and then go on to "Installing the basic W3 server".
XMOSAIC
XMosaic is an X11/Motif W3 browser.
The sources and binaries are distributed separately from
FTP.NCSA.UIUC.EDU, in /Web/xmosaic. Binaries are available for some
platforms. If you have to build from source, check the README in the
distribution.
The binaries can be picked up, uncompressed, set "executable" and run
immediately.
VIOLA BROWSER FOR X11
Viola is an X11 application for reading global hypertext. If a
binary is available for your machine, in /pub/www/bin/.../viola*,
then take that and also the Viola "apps" tar file which contains the
scripts you will need.
To generate this from source, you will need both the W3 library and
the Viola source files. There is an Imakefile with the viola source
directory. You will need to generate the XPA and XPM libraries and
the W3 library before you make viola itself.
DOCUMENTATION
In the /pub/www/doc directory are a number of articles, preprints and
guides on the web.
See the online WWW bibliography for a list of these and other
articles, books, etc. and also the list of WWW Manuals available in
text and postscript form.
GENERAL
Your comments will of course be most appreciated, on code, or
information on the web which is out of date or misleading. If you
write your own hypertext and make it available by anonymous ftp or
using a server, tell us and we'll put some pointers to it in ours.
Thus spreads the web...
Tim Berners-Lee
WorldWideWeb project
CERN, 1211 Geneva 23, Switzerland
Tel: +41 22 767 3755; Fax: +41 22 767 7155; email: timbl@info.cern.ch
Installing the basic WWW server
Instructions for installing it under unix using the inet daemon are
here.
There are special instructions if you are installing under VMS.
The usual way to install a daemon is to either run it from the
bootstrap command file (for example /etc/rc) so that it runs
continuously, or to set up the internet daemon (inetd) to run it when
a call comes in.
See a csh script which does everything below for unix BSD systems but
which you should modify with care for your own system.
Note: With version 2.0 on, a rule file is no longer essential if you
want to just export a directory tree.
The installation normally requires superuser status, but it is
possible to run httpd from a terminal session as a normal user.
LOG FILE
If a log file is required, make sure that the user name under which
the daemon is run has the right to write the file.
Tim BL
PRIVILEGED PORTS
The TCP/IP port numbers below 1024 are special in that normal users
are not allowed to run servers on them. This is a security
feature, in that if you connect to a service on one of these ports
you are fairly sure that you have the real thing, and not a fake
which some hacker has put up for you.
The normal port number for W3 servers is port 80, which is such a
port. (This number is assigned by the Internet Assigned Numbers
Authority, IANA).
When you run a server as a test from a non-privileged account, you
will normally test it on other ports, typically 2784 or 5000.
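The port rule above can be sketched in a few lines of Python. This is
only an illustration, not part of the W3 distribution: the function
names are made up here, and the listener is bound to the local loopback
address so that it can be tried safely from a normal account.

```python
import socket

PRIVILEGED_LIMIT = 1024  # ports below this are reserved for privileged users


def is_privileged(port):
    """Return True if a normal user may not run a server on this port."""
    return 0 < port < PRIVILEGED_LIMIT


def open_test_listener(port=0):
    """Listen on an unprivileged port; port 0 lets the system choose one."""
    if is_privileged(port):
        raise PermissionError("port %d is privileged; use one above 1023" % port)
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", port))
    listener.listen(1)
    return listener


if __name__ == "__main__":
    sock = open_test_listener()
    print("listening on test port", sock.getsockname()[1])
    sock.close()
```

Passing 80 to open_test_listener is refused by the sketch itself, which
mirrors what the operating system would do for a normal user.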
Under unix
The inet daemon (running as root) can listen for incoming
connections on port 80 and pass them down to a process with a safer
uid for the server itself. Of course, you have to be root to set up
the inet daemon.
Under VMS
Under UCX, the process running as a server needs BYPASS privilege
to listen to ports below 1024. This might mean you have to install
the server. With other TCP/IP packages, privilege of some sort is
similarly required.
_________________________________________________________________
Tim BL
INSTALLING A DAEMON UNDER INETD
This is how to set up the internet daemon (inetd) to run your
HTTPD server whenever a request comes in. (These steps are the same
for any daemon under unix: you will probably find a similar thing has
been done for the FTP daemon, ftpd, for example.)
Step 1
Copy the daemon program or shell script ( httpd in this example) into
a suitable directory such as /usr/etc. Protect it from anyone writing
to it except root.
Step 2
Put "http" in the /etc/services file, or use the name of a specific
service of your own if you want to have a special port number.
(Exceptions: on a NeXT, see using the NetInfoManager. On any
machine running NIS (yellow pages), see special instructions.)
For example,
http 80/tcp # WorldWideWeb server
Step 3
Put a line in the internet daemon configuration file,
/etc/inetd.conf. For example,
http stream tcp nowait nobody /usr/etc/httpd httpd
/Public
(That was all one line.) Here "http" is used as a link between the
services file and inetd.conf: it could have been any identifier.
"nobody" is the user name under which you want the daemon to run,
which determines what privileges it has for example to read data.
"/usr/etc/httpd" is the actual file name of the server. The rest of
the line is the arguments passed to httpd: arg0 is the program name,
"httpd", by convention. Here the argument "/Public" is the
directory tree to be exported. This is in fact the default if no
directory is given. See command line syntax for more details.
Note: The inetd.conf format varies from system to system. If in
doubt, copy the format of other lines in your existing inetd.conf.
For example, under ultrix there is no user name field -- everything
runs as root.
Note: there seems to be, on the NeXT at least, a limit of 4 arguments
passed across by inetd!
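To make the field layout of the inetd.conf line in Step 3 concrete,
here is a small Python sketch which splits such a line into named
fields. The function and field names are invented for this example,
and the has_user_field switch is only a rough model of systems such as
ultrix which have no user name column; check your own inetd.conf for
the real format.

```python
def parse_inetd_line(line, has_user_field=True):
    """Split one inetd.conf entry into named fields.

    Set has_user_field=False for systems whose inetd.conf has no
    user name column.
    """
    words = line.split("#", 1)[0].split()
    names = ["service", "socket_type", "protocol", "wait"]
    if has_user_field:
        names.append("user")
    names.append("program")
    if len(words) < len(names):
        raise ValueError("too few fields in %r" % line)
    entry = dict(zip(names, words))
    # Everything after the program path is the argument list;
    # by convention the first argument is the program name itself.
    entry["args"] = words[len(names):]
    return entry


if __name__ == "__main__":
    example = "http stream tcp nowait nobody /usr/etc/httpd httpd /Public"
    for key, value in parse_inetd_line(example).items():
        print(key, "=", value)
```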
Step 4
When you have updated inetd.conf, find out which process is running
inetd, and send it a "HUP" signal. On BSD unix this looks like the
following (for System V, use ps -el instead of ps aux):
> ps aux | grep inetd | grep -v grep
root 85 0.0 0.9 1.24M 304K ? S 0:01 /usr/etc/inetd
> kill -HUP 85
>
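The same two steps can be sketched in Python, for those who prefer a
script to typing the commands. This is an illustration only: the
function names are made up, the parsing assumes the BSD "ps aux" layout
with the process id in the second column, and reload_inetd must be run
as root to have any effect.

```python
import os
import signal


def find_inetd_pids(ps_output):
    """Return the process ids of inetd found in "ps aux" style output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if "grep" in fields:
            continue  # does the same job as "grep -v grep"
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # header line, or something unexpected
        if any(f == "inetd" or f.endswith("/inetd") for f in fields[2:]):
            pids.append(int(fields[1]))
    return pids


def reload_inetd(pid):
    """Send the HUP signal so that inetd re-reads its configuration."""
    os.kill(pid, signal.SIGHUP)


if __name__ == "__main__":
    sample = "root 85 0.0 0.9 1.24M 304K ? S 0:01 /usr/etc/inetd"
    print(find_inetd_pids(sample))
```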
Test it
Test the server with the line mode browser by giving its address
explicitly:
www http://myhost.dom.ain/welcome.html
This assumes that you have a file "welcome.html" in your exported
directory. If it doesn't work, you have probably missed something.
See notes on debugging.
Tim BL
USING NIS (YELLOW PAGES)
If your machine is running Sun's "Network Information Service",
originally known as "yellow pages", read this.
You must:
First make an addition to the /etc/services file just as for a
  normal unix system.
Then, change directory to /var/yp and type "make".
This will load the /etc/services file into the yellow pages
information system.
Some people have found that they needed to reboot the system afterward
for the change to take effect.
Tim BL
ADDING A SERVICE ON THE NEXT
The NeXT uses the "netinfo" database instead of the /etc/services
file. This is managed with the /NextAdmin/NetInfoManager
application. Here's how to add the service "www":
Start the NetInfoManager by double-clicking on its icon.
If you are operating in a cluster, open either your local
  domain (/hostname) or, if you have authority, the whole cluster
  domain (/). If you're not in a cluster, just use the domain you
  are presented with.
Select "services" from the browser tree.
Select "ftp" from the list of services.
Select "duplicate" from the edit menu.
Select "copy of ftp" and double-click on its icon to get
  the property editor.
Click on "name" and then on the value "copy of ftp". Change
  this to "www" by typing "www" in the window at the bottom, and
  hitting return.
Click on "port", and then on the value "21". Change it to "80".
Use "Directory:Save" menu (Command/s) to save the result. You
  will have to give a root password or netinfo manager password.
Tim BL
The Rule File
The rule file (configuration file) defines how the WWW software will
translate a request into a document name. For a server, it allows
one to provide an extra level of name mapping above that given by
links in the file system. It allows, for example, out of date names
to be mapped onto their more recent counterparts.
For the client, it allows access to certain servers to be remapped,
for example to caching servers, or to local copies of the same
information.
The rule file also allows access to be restricted. This is
essential, to prevent, for example, unauthorized access to your
password file.
By default, the rule file /etc/httpd.conf is loaded, unless specified
otherwise with the -R or -r options.
See also: example rule files, Old format for software before 2.0,
Setting up gateways, Firewall gateways.
FORMAT
Each line consists of an operation code and one or two parameters,
referred to as the template and the result. Anything on a line after
and including a hash sign (#) is ignored, as are empty lines.
The server uses the top rule first, then EACH SUCCESSIVE RULE unless
told otherwise by PASS or FAIL. The operation codes are as follows
map template result     If the address matches the template, use the
                        result string from now on for future rules.
pass template           If the address matches the template, use it
                        as it is, processing no further rules.
pass template result    If the address matches the template, use the
                        result string as it is, processing no further
                        rules.
fail template           If the address matches the template,
                        prohibit access, processing no further rules.
The template string may contain at most one wildcard asterisk ("*").
The result string may have one wildcard only if the template has one.
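As an illustration of the line format just described, the following
Python sketch reads one rule-file line and checks the wildcard
constraints. It is not the daemon's own parser: the function name is
invented, only the map, pass and fail operation codes are handled, and
the choice to raise an error for anything else is an assumption of this
example.

```python
# Minimum and maximum number of parameters for each operation code.
ARITY = {"map": (2, 2), "pass": (1, 2), "fail": (1, 1)}


def parse_rule_line(line):
    """Return (op, template, result), or None for blank or comment lines.

    The result is None when the rule has a template only.
    """
    text = line.split("#", 1)[0].strip()  # drop the hash sign and what follows
    if not text:
        return None
    words = text.split()
    op, params = words[0], words[1:]
    if op not in ARITY:
        raise ValueError("unknown operation code %r" % op)
    low, high = ARITY[op]
    if not low <= len(params) <= high:
        raise ValueError("wrong number of parameters for %r" % op)
    template = params[0]
    result = params[1] if len(params) == 2 else None
    if template.count("*") > 1:
        raise ValueError("template may contain at most one wildcard")
    if result is not None and result.count("*") > template.count("*"):
        raise ValueError("result may have a wildcard only if the template has one")
    return (op, template, result)


if __name__ == "__main__":
    print(parse_rule_line("map /tnotes/* file:/u/john/public/*  # tech notes"))
```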
When matching,
Rules are scanned from the top of the file to the bottom.
If a request matches a "map" template exactly, the result string
  is used instead of the original string and applied to successive
  rules.
If the request matches a "map" template with wildcard, then the
  text of the request which matches the wildcard is inserted in
  place of the wildcard in the result string to form the
  translated request. If the result string has no wildcard, it is
  used as it is.
When a map substitution takes place, the rule scan continues
  with the next rule using the new string in place of the request.
  This is not the case if a pass or fail is matched: they
  terminate the rule scan.
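The matching procedure above can be written out as a short Python
sketch. It follows the description in this section rather than the
daemon's source code, and the function names are invented. One point is
an assumption of the example: a request which reaches the end of the
rules without meeting a pass is refused. The rules used for the
demonstration are those of the second example in "Rule file examples".

```python
def match_template(template, text):
    """Return the text matched by the wildcard ("" if the template has
    none) when text matches the template, otherwise None."""
    if "*" not in template:
        return "" if text == template else None
    prefix, suffix = template.split("*", 1)
    long_enough = len(text) >= len(prefix) + len(suffix)
    if long_enough and text.startswith(prefix) and text.endswith(suffix):
        return text[len(prefix):len(text) - len(suffix)]
    return None


def translate(rules, request):
    """Apply (op, template, result) rules from top to bottom.

    Returns the translated name, or None if access is prohibited.
    """
    current = request
    for op, template, result in rules:
        wild = match_template(template, current)
        if wild is None:
            continue  # this rule does not apply
        if result is not None:
            current = result.replace("*", wild, 1)
        if op == "pass":
            return current  # accepted: stop scanning
        if op == "fail":
            return None  # prohibited: stop scanning
        # op is "map": carry on with the new string
    return None  # assumption: nothing passed the request, so refuse it


EXAMPLE_RULES = [
    ("map", "/", "/tnotes/welcome.html"),
    ("map", "/tnotes/*", "file:/u/john/public/*"),
    ("map", "/seminars/*", "file:/u/jane/seminars/*"),
    ("pass", "file:/u/john/public/*", None),
    ("pass", "file:/u/jane/seminars/*.html", None),
    ("fail", "*", None),
]

if __name__ == "__main__":
    for name in ("/", "/seminars/1993/talk.html", "/etc/passwd"):
        print(name, "->", translate(EXAMPLE_RULES, name))
```

Note that the wildcard is matched against any sequence of characters,
slashes included, which is why a document in a subdirectory of the
seminars tree is accepted.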
SUFFIX DEFINITIONS
As well as any mapping lines in the rule file, the rule file may be
used to define the data types of files with particular suffixes. The
syntax is
suffix <suffix> <representation> <encoding> [ <quality> ]
for example:
suffix .pc text/plain 7bit 1.0
suffix *.* application/binary binary 0.1
suffix * text/plain 7bit
The parameters are as follows:
<suffix>          The last part of the filename. There are two
                  special cases. "*.*" matches all files
                  which have not been matched by any explicit
                  suffixes but do contain a dot. "*" by itself
                  matches any file which does not match any
                  other suffix.
<representation>  A MIME "content-type" style description of
                  the representation in fact in use in the file.
                  See the HTTP spec. This need not be a real
                  MIME type -- it will only be used if it
                  matches a type given by a client.
<encoding>        A MIME content transfer encoding type. Much
                  more limited in variety than representations,
                  basically whether the file is ASCII (7bit or
                  8bit) or binary. A few other encodings are
                  allowed, and maybe extension to compression.
<quality>         Optional. A floating point number between
                  0.0 and 1.0 which determines the relative
                  merits of files xxx.* which differ in their
                  suffix only, when a link to xxx.multi is
                  being resolved. Defaults to 1.0.
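To show how suffix definitions and quality values might work together,
here is a Python sketch which reads suffix lines, looks up the type of
a file name, and picks the best of several variants for a client. It is
a guess at the behaviour from the description above, not the daemon's
algorithm: the function names are invented, the set of types "accepted"
by the client is a stand-in for whatever the client sends, and the rule
that the longest explicit suffix wins is an assumption.

```python
def parse_suffix_line(line):
    """Return (suffix, (representation, encoding, quality))."""
    words = line.split("#", 1)[0].split()
    if len(words) not in (4, 5) or words[0] != "suffix":
        raise ValueError("not a suffix line: %r" % line)
    quality = float(words[4]) if len(words) == 5 else 1.0  # default 1.0
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must lie between 0.0 and 1.0")
    return words[1], (words[2], words[3], quality)


def load_table(lines):
    """Build a suffix table from a list of suffix lines."""
    return dict(parse_suffix_line(line) for line in lines)


def describe(filename, table):
    """Look up (representation, encoding, quality) for a file name."""
    explicit = [s for s in table if s not in ("*", "*.*")]
    # Assumption: if several explicit suffixes match, the longest wins.
    for suffix in sorted(explicit, key=len, reverse=True):
        if filename.endswith(suffix):
            return table[suffix]
    if "." in filename and "*.*" in table:
        return table["*.*"]
    return table.get("*")


def choose(filenames, table, accepted):
    """Pick the variant of highest quality whose type the client accepts."""
    best_name, best_quality = None, -1.0
    for name in filenames:
        info = describe(name, table)
        if info is None or info[0] not in accepted:
            continue
        if info[2] > best_quality:
            best_name, best_quality = name, info[2]
    return best_name


if __name__ == "__main__":
    table = load_table([
        "suffix .pc text/plain 7bit 1.0",
        "suffix *.* application/binary binary 0.1",
        "suffix * text/plain 7bit",
    ])
    print(choose(["report.pc", "report.exe"], table, {"text/plain"}))
```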
PRESENTATION DEFINITIONS
In the rule file for a client, you can define the presentation of a
given data type. The syntax is
presentation <representation> <command-string>
where the parameters are
<representation>  A MIME-style content type. You can use
                  regular MIME types, such as image/jpeg, or
                  your own extensions which start with x-, such
                  as image/x-tiff, application/x-my-app. See
                  also above.
<command-string>  The command needed to display a temporary
                  file of this type. A "%s" within this string
                  will be replaced with the name of the
                  temporary file. Note that if any file suffix
                  has been specified as corresponding to this
                  representation, then the temporary file will
                  be given that suffix (or the first if there
                  is a choice).
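The "%s" substitution can be pictured with the following Python
sketch. It is only a model of what a client might do: the function
names are invented, the viewer command "xv %s" is a hypothetical
example, and the sketch builds the command line without running it.

```python
import os
import tempfile


def parse_presentation_line(line):
    """Return (representation, command_string) from a presentation line."""
    words = line.split("#", 1)[0].split(None, 2)
    if len(words) != 3 or words[0] != "presentation":
        raise ValueError("not a presentation line: %r" % line)
    return words[1], words[2].strip()


def build_command(command_string, data, suffix=""):
    """Save data in a temporary file and fill its name into the command.

    Returns (command, path). The caller should remove the file when
    the viewer has finished with it.
    """
    handle, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(handle, "wb") as out:
        out.write(data)
    return command_string.replace("%s", path), path


if __name__ == "__main__":
    rep, cmd = parse_presentation_line("presentation image/jpeg xv %s")
    command, path = build_command(cmd, b"not really a picture", suffix=".jpg")
    print(rep, "would be shown with:", command)
    os.remove(path)
```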
Tim BL
RULE FILE EXAMPLES
A basic rule file for the http daemon might look like this (it looked
different before version 2.0):
pass / file:/u/john/welcome.html
pass /* file:/u/john/public/*
fail *
The first line maps the root document onto a specific document about
the server, and accepts it. (See etiquette about the welcome page.)
The second line maps all document names onto filenames in a
particular directory and accepts them.
The third line disallows access to all other documents. (There won't
be any in this case because of the mapping, but it's wise to put it in
for later.)
Second example
map / /tnotes/welcome.html
map /tnotes/* file:/u/john/public/*
map /seminars/* file:/u/jane/seminars/*
pass file:/u/john/public/*
pass file:/u/jane/seminars/*.html
fail *
The first line maps the root document onto a specific document about
the server. Because it is "map" and not "pass", it DOESN'T accept
it but passes it on for further mapping by lines further down.
The second line maps all document names starting with /tnotes/ onto
filenames in a particular directory where john maintains the
technical notes. If someone else takes over the technical notes, we
can change this. Here we are starting to distinguish between document
names and file names. This can be carried much further if necessary,
but one level of mapping is enough to allow for changes of
administration of different areas.
The third line separately maps the seminar information into Jane's
directory.
The fourth and fifth lines enable access to anything in John's
"public" directory, and any .html file in Jane's "seminars" directory
tree. Note here that the * maps to any sequence INCLUDING SLASHES so
all files in any subdirectory of /u/jane/seminars will be enabled so
long as they end in .html.
The bottom line will pick up for example any attempt to use the
server to access non-html files in Jane's seminars directory.
Configuration file for a WAIS gateway
The httpd daemon can be used as a WAIS gateway if it has been compiled
with the necessary options and linked with the freeWAIS software. A
suitable configuration file is
map /* wais://*
pass wais://*
fail *
Server Command Line
The command line syntax for the basic www server allows a number of
options and an optional directory argument.
httpd [options] [directory]
The directory argument, if present, indicates the directory to be
exported. (Version 2.0 and later only.) If not present, either a
rule file is used, to export combinations of directories, or else
the default is to export the "/Public" directory tree.
EXAMPLES
httpd -p 80 -dyt /ftp/pub
This exports the entire /ftp/pub tree with browsable directories and
README files included at the top of directory listings.
httpd
This command in the inetd configuration file inetd.conf exports the
/Public directory tree. This tree may contain soft links to other
directory trees.
-dn         Disable directory browsing. An attempt to
            access a directory will generate an error
            response.
-dy         Enable directory browsing. Directories are
            returned as hypertext documents. See browsing
            directories. This is the default.
-ds         Enable directory browsing only for
            directories containing a file named
            ".www_browsable".
-dt         For any browsable directory which contains a
            README file, include the text of the README
            file at the top of the document before the
            listing. This is the default.
-db         As -dt but put the README at the bottom,
            after the listing. The -db and -dt options
            may be combined with -dy as -dyb, -dty etc.
-dr         Disables the README inclusion feature.
-l file     Log all calls to the given file. The file is
            appended to if it already exists.
-p port     Specify the port number. If this option is
            not given, the daemon assumes that it has
            been run by inetd, and uses stdin and stdout
            as its communication channel. Note that port
            numbers under 1024 are privileged.
-v          Verbose mode. Copious trace messages are
            written to the standard output stream. Mainly
            for debugging.
-r file     Load a rule file. The rules are added after
            any rules already loaded. Inhibits the
            loading of the default rule file.
-R          Do not use. Inhibit the loading of the
            default rule file. Warning: running without
            a rule file normally poses a security
            problem. It won't work in general as only
            the path part of a URL is input into the rule
            file, and a fully qualified URL (with file:
            in front for example) is required on output.
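As a way of checking one's reading of the option list, the following
Python sketch decodes an httpd-style argument list, including combined
letters such as -dyt. It uses Python's standard argparse module and is
not how the daemon itself parses its arguments; the function name and
the keys of the returned dictionary are invented for this example.

```python
import argparse

BROWSING = {"n": "off", "y": "on", "s": "selective"}
README = {"t": "top", "b": "bottom", "r": "off"}


def parse_httpd_args(argv):
    """Decode an httpd-style argument list into a dictionary."""
    parser = argparse.ArgumentParser(prog="httpd")
    parser.add_argument("-d", dest="dflags", action="append", default=[])
    parser.add_argument("-l", dest="logfile")
    parser.add_argument("-p", dest="port", type=int)
    parser.add_argument("-v", dest="verbose", action="store_true")
    parser.add_argument("-r", dest="rulefiles", action="append", default=[])
    parser.add_argument("-R", dest="no_default_rules", action="store_true")
    parser.add_argument("directory", nargs="?")
    args = parser.parse_args(argv)

    browsing, readme = "on", "top"  # the documented defaults, -dy and -dt
    for letter in "".join(args.dflags):
        if letter in BROWSING:
            browsing = BROWSING[letter]
        elif letter in README:
            readme = README[letter]
        else:
            parser.error("unknown -d letter: %s" % letter)
    return {
        "browsing": browsing,
        "readme": readme,
        "logfile": args.logfile,
        "port": args.port,
        "under_inetd": args.port is None,  # no -p: talk over stdin and stdout
        "verbose": args.verbose,
        "rulefiles": args.rulefiles,
        "no_default_rules": args.no_default_rules,
        "directory": args.directory,
    }


if __name__ == "__main__":
    print(parse_httpd_args(["-p", "80", "-dyt", "/ftp/pub"]))
```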
Tim BL
Debugging the daemon
Suppose you think you have installed a W3 server but it doesn't work.
That is, you have followed the installation instructions and the test
at the end fails. Here we assume you have used port 80. If you have
a situation not handled by this problem-solving guide, please mail
me.
Type
www http://myhost.domain:80/
What happens?
"Cannot connect to information server" message, "Unable to
access document" or some other generic-sounding error message
An empty document is displayed
A document containing the words "Document address invalid or
access not authorised", or some "Error 500" message is displayed
A document is displayed, but not what you wanted the server to
give in response to that document name (/)
Tim BL
DOCUMENT ADDRESS INVALID
You have accessed a W3 server and you get back a message "Document
address invalid or access not authorized", or some other error
message from the server.
The 1.x server does not (originally for security reasons)
distinguish between a document which does not exist, and one to
which you are not allowed access. However, most servers are public
servers which allow access to anyone, so if you are following a bona
fide link, this could mean
You have been passed a bad document address. If you are
following a link, check with the author of the document which
contained the link.
The document has been moved. Check with the server
administrator. You should be able to find out who runs the
server by going to the welcome page (type "g /" with the line
mode browser) and seeing a link to information about the
maintainers.
If you are the server administrator, and you can't understand why
the daemon refuses to deliver the file,
Check the rule file if you have one. Think out the way the document
name will be mapped successively by each line, and what the
result will be. Checking the trace below may help clarify this.
Run the daemon with trace from a terminal session to get trace
information
Tim BL
CAN'T CONNECT TO SERVER
There is more information you can get. Use the "verbose" option on
the browser to find out what went wrong:
www -v http://myhost.domain:80/
What do you get? A load of trace messages. There are several cases.
The browser can't look up the name of the host. If it can, it
will display "Parsed address as" message. If not, try fixing
your name server or /etc/hosts file, or quoting the IP number of
the host in decimal notation (like 128.141.77.45) instead.
The browser can get to the host but gets "Connection refused"
status back.
Your browser gets an error number but prints "error message not
translated". This is because when it was compiled on your
platform it didn't know what form the error message table took.
Try the same thing from a unix platform for example.
You get some network error like "network unreachable". Depending
on whether the IP network is your responsibility or not, and
your attitude to life, either fix it, try again in an hour's
time, or complain to someone.
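The cases above can be told apart with a few lines of Python if the
browser's trace is not available. This is a rough sketch, not a
replacement for the -v trace: the function name and the wording of the
returned messages are made up for this example.

```python
import socket


def diagnose(host, port, timeout=5.0):
    """Try to reach a server and name the stage which failed."""
    try:
        address = socket.gethostbyname(host)  # the name lookup stage
    except OSError:
        return "cannot look up host name"
    try:
        connection = socket.create_connection((address, port), timeout=timeout)
    except ConnectionRefusedError:
        return "connection refused"  # no one is listening on that port
    except socket.timeout:
        return "timed out"
    except OSError as error:
        return "network error: %s" % error  # e.g. network unreachable
    connection.close()
    return "connected"


if __name__ == "__main__":
    print(diagnose("127.0.0.1", 80))
```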
_________________________________________________________________
Tim BL
"CONNECTION REFUSED"
The browser tries to connect to the daemon but gets this status in the
trace.
This means that no one was listening on that port number. Check that
the port numbers match between server and client. Make sure you specify
the port number explicitly in the document address for www.
If you are running the daemon without the inet daemon (with the -a
option), then try running it from the terminal with -v as well. The
trace for the server should say "socket, bind and listen all ok". If it
does, and you still get "connection refused", then you must be talking
to the wrong host (or, conceivably, different ethernet adapters on the
same host).
If you are running with the inet daemon, then check both the services
file (/etc/services) or database (yellow pages, netinfo) if your system
uses it, and the /etc/inetd.conf file. Check that the service name
matches between these two.
Did you remember to kill -HUP the inet daemon when you changed the
inetd.conf file?
Try running the daemon from a shell window to get a better view of what
happens.
Tim BL
YOU GET AN EMPTY DOCUMENT
The document sent back is empty, but there is no error message.
The inet daemon has started a process to run your server but it
immediately failed. Possibilities include:
The daemon may not be in the file specified, or may not be
executable by the specified user (or, if a user id is not
specified in your variety of inetd.conf, root).
You have written your own daemon and it crashes.
You are using ours and it crashes (mail us!)
Try running the daemon from a terminal window to see what happens.
Tim BL
BAD OUTPUT FROM THE DAEMON
These are some ideas:
Try running the server from the terminal.
Check the HTML source the daemon produces with
www -source http://myhost.domain:80/
Try telnetting to the daemon and simulating the client:
> telnet myhost.domain 80
Connected to myhost.domain on port 80
Escape is ^[
GET /documentname
Tim BL
TELNETTING TO A SERVER
Most implementations of telnet allow you to specify a port number.
Under unix this is often just a second parameter, under VMS a /PORT
option.
The HTTP protocol is a telnet-style protocol, so you can simulate it
just by typing things in. This will help you to see exactly what the
server is sending back, and it will confirm for you that it really is
the server, not the browser, which has a problem.
Here is an example. (You type "telnet..." and "GET ...").
> telnet myhost.domain 80
Connected to myhost.domain on port 80
Escape is ^[
GET /documentname
<PLAINTEXT>
Document name "/documentname" invalid.
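The same exchange can be scripted when telnet is not convenient. This is a minimal sketch in Python, assuming the simple one-line request shown in the session above; the function name fetch and the timeout are illustrative assumptions, not part of any W3 tool.

```python
import socket

def fetch(host, port, docname, timeout=10.0):
    """Send a one-line GET request and return everything the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # The request is a single line, as typed in the telnet session.
        conn.sendall("GET {}\r\n".format(docname).encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("myhost.domain", 80, "/documentname") should return the same bytes that telnet displays, which makes it easy to save the reply to a file and inspect it.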
RUNNING UNDER SHELL
You don't have to run the daemon under inetd if it doesn't work.
You can run it from a shell session.
If the daemon is httpd, then run it from your terminal, with a
different port number like 8000. You use the -p option:
httpd -p 8000
Note: You must be root (under VMS, have some privilege) to run with a
port number below 1024.
If you select a port above 1024, then you can run as a normal user.
This way, anyone can publish files on the net. However, it isn't
very reliable, as your server will not automatically come back up if
the machine is rebooted. In the long term it is best to install it
under "inetd".
You can't use a port number which has been used by a daemon process
recently, so you may have to switch port number if you ^C and restart
the daemon. When it is running like this, you can read the trace
messages and use a debugger on it if necessary. (See also: telnetting
to the server.)
Debugging using Trace
If you can't understand why a server refuses to give back a document,
then run with the -v option to get trace. You will see the daemon
setting up the rules for translating requests into local URLs, and
you will see its attempt to access the file (assuming you map requests
onto files).
httpd -v -p 8000
Try to access the document from a client using another terminal
window. Look at the trace printout. It will probably explain what is
happening. If it includes specific messages below, follow them to
detailed help.
Can't find internet hostname `'
If you still can't figure out the problem, mail your local guru help
desk or if desperate www-request@info.cern.ch ENCLOSING a copy of
that trace.
Even simpler
For testing a daemon very simply, without using a client, you can
make the terminal be the client. With httpd, or if the server is a
shell script "myserver", try just running it with the terminal and
typing GET /documentname into its input:
> httpd
GET /
Try it with the -v option if what comes back isn't a formatted
document.
Tim BL
The basic W3 server: Internals
This describes the generic hypertext daemon (server) program. The
daemon is part of the WWW project. See also:
User guide
Bugs and Features
Other servers
The hypertext daemon, like the ftp daemon, is a program which
responds to an incoming tcp connection and provides a service to the
caller.
SOURCES
A compilation option (SELECT) controls whether more than one
connection can be handled at a time. This is a function of whether
the TCP/IP implementation beneath the application has a working
"select()" routine. If it is not true, this implementation services
one connection, then drops it before accepting another one. In
neither case does the daemon concurrently serve two clients, nor does
it fork off a process to do that.
The basic server loop is in the file HTDaemon.c. A separate module
(for example HTRetrieve.c) contains the code to handle one request.
Various specific versions of this may be written for different
flavours of server. Also used are various modules of WWW common code.
The httpd released from CERN uses almost the entire W3 library and
can therefore access any object which a browser running on that
machine can access, and return it as HTML or some other format.
Tim BL
Bugs and Improvements needed
Improvements to be made in the HTTP daemon program are as follows.
(See also Features)
Call shell scripts to perform searches on directory trees or
documents.
The HTRetrieve() routine ought to be able to pick up the user
node and userid, etc...
Ought to have chroot option. (wwwww July 93)
Tim BL
Daemon features: Update history
History list for the WWW daemon. (See also bugs.) Many other
changes to the daemon are in fact changes to the common code library.
2.06 7 JUNE 93
Bug fix: Load error 500 returned as proper HTTP status, not as
simple document.
WAIS gateway now caches source files again.
Bug fix: Daemon used to try to display graphics file locally on
the server when the client couldn't display them! Cause of much
confusion :-)
2.05
Big bug fix in local file directory handling: didn't work in
2.04!
2.04 28 APRIL 93
With the properly compiled libwww library, this daemon will
operate as a WAIS, news etc gateway if so configured.
WAIS gateway operation bug fix.
2.03-BETA: UNRELEASED
Bug fix: operation with no rule file didn't work as expected.
2.02-BETA: 17 MARCH 93
Misleading error trace removed.
Compiled on HP, SGI, Sun, DEC, NeXT and binaries available
Binary handling fixed in library.
Reference to missing HTDirRead.h removed.
Assumes that user can handle files of unknown format
(application/binary).
2.00-ALPHA 15 MAR 93
Simple command line -- with no parameters, exports the /Public
directory.
Multiformat handling -- see library changes for 2.0. Links to
.multi filenames resolve to any file with same root, any
recognised extension.
UNRELEASED 0.9B
Bug fix: If a PASS or FAIL line in the configuration file acted
on a single document id (ie no wildcard) then it crashed the
daemon. (HTRules.c, 17-Jun-92, TBL).
SEPT 1991 V0.3
Bug fix: Plain text files were returned to be parsed as SGML,
causing them to come out as garbage. (Mike Sendall)
AUGUST 1991 V 0.2
-R option now suppresses default rule file.
Rule file format changed completely. Now allows authorisation of
specific paths only.
JUNE 1991 VERSION 0.1
-r and -R options for rules
Default address is now for Inet daemon working. (29 June)
-l option to log to a file.
-a option for address other than default
_________________________________________________________________
Tim BL
A SHELL SERVER FOR HTTP
The HTTP protocol is very simple. The following is an example of a
server program written in sh:
#! /bin/sh
read get docid
echo "<TITLE>$docid</TITLE>"
echo Here is the data
The docid may have a trailing carriage return to be stripped off on
some systems. You can modify that script to produce the data you
actually want. The HTML syntax for marked-up text is fairly simple,
but if you want just to send plain text, then just send the
<PLAINTEXT> tag first:
#! /bin/sh
read get docid
sed -f txt2html.sed $docid
or in csh
#! /bin/csh
set request = ( `echo $<` )
if ($#request < 2) exit
sed -f txt2html.sed $request[2]
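For comparison, here is the first sh example sketched in Python, with the trailing carriage return handled explicitly. It is an illustration only: the names respond and main, and the "Bad request" reply, are assumptions and not part of the protocol or of any W3 code.

```python
import sys

def respond(request_line):
    """Build a reply for a one-line request such as 'GET /docid'."""
    # split() discards the trailing CR/LF that some clients send.
    words = request_line.split()
    if len(words) < 2 or words[0] != "GET":
        # Hypothetical error reply, chosen for this sketch only.
        return "<TITLE>Bad request</TITLE>\nExpected: GET docid\n"
    docid = words[1]
    return "<TITLE>{}</TITLE>\nHere is the data\n".format(docid)

def main():
    # Under the inet daemon, stdin and stdout are the network connection.
    sys.stdout.write(respond(sys.stdin.readline()))
```

To use it as a server script, call main() at the end of the file. Keeping the reply in a separate function means it can be tried from an interactive session without any network at all.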
When you have written your script, set the execute bit and then
configure the inet daemon to run it. A few more examples:
A sh script to generate a menu for files in a directory
An awk script to generate a menu from a list of files
A perl script for all kinds of stuff on the ASIS server
The shell script of the Hytelnet gateway
If you know the perl language, then that is a powerful (if otherwise
incomprehensible) language with which to hack together a server.
See also a case study of mapping a database onto the web.
All contributions to these examples welcome!
Tim BL
Making a server
Here is a run-through of what is needed to make a www server, with
examples from a suggested server for the HEPDATA base of Mike
Whalley. See also etiquette.
Basically, to make the data available, you make a server which is a
modified version of your program. When a user follows a link to
HEPDATA (or runs a command to jump straight there), the client
program opens a connection to a server program on a VM machine
(say, but could be VMS or unix). The server in turn runs your
program.
Let me just describe the essence of the changes needed so that you
can get an idea of how much effort would be involved.
The first thing you do is to make up an arbitrary naming method for
anything which HEPDATA can display. In this I include the welcome
page, any menu, any article, any help text. Typically one invents
a hierarchical naming scheme, like
/HEPDATA                   The first "welcome" menu
/HEPDATA/HELP              The top-level help
/HEPDATA/HELP/REAC         The help on the reaction database.
/HEPDATA/REAC              The reaction database itself
/HEPDATA/REAC?P+PBAR       list of reactions involving p and pbar (?)
/HEPDATA/DATA/RD125V687    Some article (say).
You do this because, whereas an interactive user follows a path
through the program, the W3 user calls the program once for each
thing. There is no "state" information. This allows one to make a
hypertext link to any part of the scheme and jump back in again
later. For example, one might want to quote an article, or the
reaction database, or a particular list of reactions.
Now all you do is modify the program so that, given a name above,
it will return the required document. This means basically turning
it from a sequence the user goes through into a set of conditionals
to isolate each of the individual cases above. Apart from that, the
data retrieval code is unchanged apart from the output formatting.
Many of the options in fact mean mapping the name onto a fixed
file's name; it's the searches which have to activate real code.
The hypertext trick is what you need to use in the menus. Where an
option is normally output to the screen, you have to tell the client
what to ask for if the user selects that option. For example, in the
main menu /HEPDATA you have an option which gives the help. You
would represent this "anchor" as
<A NAME=4 HREF=/HEPDATA/HELP> Help </A>
"Help" is all that is displayed, with some indication that it is an
option. If the user chooses it (clicks a mouse on it, or chooses by
number, depending on which client he has) then the client asks the
server for /HEPDATA/HELP. ("A" is for "anchor", "HREF" is for
"hypertext reference".)
For the index searches, it's as simple. When the server sends the
text called /HEPDATA/REAC it also sends a special tag. This tells
the client to enable a FIND command, or find panel etc (depending
on the client). You don't have to do any human interface work. The
client automatically comes back with a search coded up in the form
/HEPDATA/REAC?P+PBAR etc. Your server in turn returns a menu (say)
with pointers to the data which has been found.
You can also put some formatting tags (like headings) which will
make the data look really nice on a window system.
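The "set of conditionals" for the HEPDATA naming scheme might look something like the sketch below. It is written in Python purely for illustration; the function name route and the labels it returns ("welcome", "help", "index", "search", "article") are invented here, and a real server would call the existing retrieval code at each branch.

```python
def route(docid):
    """Map a document name onto (kind, remaining path parts, search words)."""
    # Everything after "?" is the search, with words separated by "+".
    path, _, query = docid.partition("?")
    parts = [p for p in path.split("/") if p]
    words = query.split("+") if query else []
    if not parts or parts[0] != "HEPDATA":
        return ("unknown", parts, words)
    rest = parts[1:]
    if not rest:
        return ("welcome", rest, words)
    if rest[0] == "HELP":
        return ("help", rest[1:], words)
    if rest[0] == "REAC":
        # Only the search case needs to activate real code.
        return ("search" if words else "index", rest[1:], words)
    if rest[0] == "DATA":
        return ("article", rest[1:], words)
    return ("unknown", rest, words)
```

Because each request carries its full name, no state is needed between calls: route("/HEPDATA/REAC?P+PBAR") can be answered without the user ever having visited the welcome menu.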
_________________________________________________________________
Tim BL
W3 AND HTMLTOOLS
These tools aid management of W3 servers, generation of hypertext,
etc.
W3 basic daemon Part of the W3 project code.
Index search server which is a slight modification to basic CERN
daemon, with a couple of scripts and WAIS
programs. Implements searches on entire
directory trees of WWW documents using WAIS
inverted indexing.
Gateway servers which you can take and adapt.
Framemaker interface There are some tar files on the anonymous
FTP archive on file://info.cern.ch/www/src
which allow FRAMEmaker to be used as a W3
tool. Dan Connolly, Convex. Includes MIF HTML
translation.
Making HTML into TeX We did this with the "WWW Book" to print it.
See the Makefile for example, and the scripts
html2latex.sed and sub1.sed . We wrote a
special introduction, but otherwise all the
text was hypertext from the W3 project.
Generating HTML These are scripts for generating SGML
hypertext from things like directory
listings, etc. Also, for checking and
correcting dubious HTML.
WP5.1 to HTML WordPerfect 5.1 to HTML conversion
LaTeX to HTML Code from Nikos Drakos, Computer Based
Learning Unit, University of Leeds.
Server log analysis Analysing server logs requires first of all
changing the numeric internet node numbers
into domain names. httpd-analyse.c is a
program to do that. Feed the results through
awk and grep of your choice!
Server log analysis Getsites.c is a program which generates
reports on a weekly or monthly basis.
Web-roaming robot etc
Guido van Rossum's knobot code in "Python"
language.
Telnet server Setting up a service machine for anonymous
users to log in to a www client.
Mail Robot A program to return any information in the
web by electronic mail.
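As an illustration of the first step of server log analysis (changing numeric internet node numbers into domain names), here is a rough sketch in Python. It is not httpd-analyse.c; it assumes, for the sake of the example, that the address is the first field on each log line, and the names resolve and name_log_line are invented here.

```python
import socket

def resolve(address, lookup=socket.gethostbyaddr):
    """Return the domain name for a numeric address, or the address itself."""
    try:
        return lookup(address)[0]
    except OSError:
        # No reverse entry (or no name service): keep the number.
        return address

def name_log_line(line, lookup=socket.gethostbyaddr):
    """Replace the first field of a log line with its domain name."""
    fields = line.split(None, 1)
    if not fields:
        return line
    rest = fields[1] if len(fields) > 1 else ""
    return (resolve(fields[0], lookup) + " " + rest).rstrip(" ")
```

The lookup parameter is there so that the function can be tried with a fake resolver; in normal use the default performs a real reverse lookup, which can be slow on a large log.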
Tim BL
HTMLGeneration
Here are some example files you can use for generating HTML from
lists of files and other things.
RTF to HTML Convert RTF (using specific styles) into
HTML.
fix-html.pl written by Dan Connolly, is a perl script to
legitimize old HTML files into SGML-abiding
HTML (as per the DTD that Dan created).
text2html.sed A sed script to turn plain text into
plain-looking valid HTML markup so that it
will be rendered just as it was.
ls2html.awk is an awk script which will just take a list
of names and generate a menu.
dir2html is a shell script which generates a menu of
pointers to files with particular suffixes in
a set of directories. It also includes a
README file at the head of the hypertext list
if one exists.
htn2html.c See the Hytelnet gateway for the program to
convert hytelnet data into HTML.
findrefs.pl Written by Ari Lemmke, finds references
http:... in plain text files and generates
anchors out of them.
You can make any variations on these you like of course. [CERN does
not accept any responsibility for things quoted in these lists].
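To give an idea of how little such a generator needs to do, here is a sketch in Python of the same job as ls2html.awk: take a list of names and generate a menu. It is not a translation of that script; the function name names_to_menu, the default title and the choice of a UL list are assumptions made for this example.

```python
import html

def names_to_menu(names, title="Index"):
    """Return an HTML menu with one anchor per non-empty name."""
    lines = ["<TITLE>{}</TITLE>".format(html.escape(title)), "<UL>"]
    for name in names:
        name = name.strip()  # drop the newline left by reading a file
        if not name:
            continue
        safe = html.escape(name, quote=True)
        lines.append('<LI><A HREF="{0}">{0}</A>'.format(safe))
    lines.append("</UL>")
    return "\n".join(lines) + "\n"
```

Feeding it the lines of a directory listing gives a page of links relative to the directory being served. Names containing characters which are special in addresses would need further encoding, which this sketch does not attempt.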
Updating the Newsgroup lists
To update some of the news pages automatically you must be logged on
to the news server or have the news directories mounted.
Carl mentioned that you must be a member of the UNIX group news
(otherwise you won't have permission to read the news directories)
but that doesn't seem to be necessary for these functions.
UPDATEGROUPS
This script updates the list of newsgroups. For the overview list,
it saves everything before the "Others" heading, and adds on a list
of pointers to newsgroup stems not already mentioned in the saved
hypertext.
For each stem, it saves any comment before the glossary list of
groups, and then regenerates that list of groups.
NEWSPAGE_UPDATE (OLD)
The script NewsPage_Update creates complete lists of active groups
for the following groups: alt, bionet, bit, biz, cern, ch, comp,
eunet, gnu, news, rec, sci, soc, talk, vmsnet. It does this by
writing the header in explicitly for each group, and then generating
a list of subgroups using FindGroups.
For comp and news, a full list is placed in fullcomp.html and
fullnews.html. The files comp.html and news.html are formatted by
hand already, and so are not touched by the script.
NewsPage_Update works by writing some HTML text into a file for each
group to be updated, called [newsgroup_name].html.new, then calling
the script FindNewsGroups. This checks the file
/usr/local/lib/news/newsgroups for the groups within the current
group which are active. Finally the new file is renamed to remove
the .new.
The list of stems to search, and their titles and any other comment
is hardcoded into the NewsPage_Update script, and the list is
DUPLICATED in Others_Update.
OTHERS_UPDATE
The Others_Update script finds stems which are not included in the
Overview.html file, but which are active. This list of which groups
not to include is hardcoded into the script. For each group, it
calls GrpCreate. This adds the name to OtherGroups/Overview. It
then runs FindNewsGroups for each group.
NOTE
Once the script has completed, all the .new groups must be renamed
manually to remove the .new extension.
GRPCREATE
This reads a newsgroup stem name from stdin.
It then creates the top of a file for the list of groups with that
stem. This will be called ${nn}.html.new, where ${nn} is the stem
name. Unfortunately there is no way to get a description of the stem
to include in this file. However, if the .html file already exists,
it will use everything up to and excluding the first DL tag from the
.html file for the .html.new file. Therefore, everything above the DL
tag may be hand edited.
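The rule "keep everything up to and excluding the first DL tag" can be sketched as follows. This is Python written for illustration and is not the GrpCreate script; the function names, the behaviour when no DL tag is found, and the format of the regenerated entries are all assumptions.

```python
import re

def preserved_head(old_html):
    """Return everything before the first <DL> tag (case-insensitive)."""
    match = re.search(r"<DL\b", old_html, flags=re.IGNORECASE)
    # Assumption: with no DL tag at all, the whole file counts as the head.
    return old_html if match is None else old_html[:match.start()]

def rebuild(old_html, groups):
    """Keep the hand-edited head and regenerate the list of groups."""
    items = "".join("<DT>{}\n".format(g) for g in groups)
    return preserved_head(old_html) + "<DL>\n" + items + "</DL>\n"
```

This is why hand edits above the list survive an update, while anything typed inside the list is lost the next time the script runs.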
GrpCreate adds a pointer from OtherGroups/Overview.html.new to the
.html file.
The .html file is renamed .html.old, and the .html.new becomes .html,
with diffs being stored in a .diffs file under the date.
.\" Macros for HTML
.\" Jim Davis 6 Nov 92
.ps 12
.in 5
.de B
..
.de R
..
.de H1
.ti -5
.ps 18
\fB\\$1\fR
.ps 12
.br
..
.de H2
.ti -3
.ps 14
\fB\\$1\fR
.ps 12
.br
..
.de H3
\\$1
.br
..
.de H4
\\$1
..
.de H5
\\$1
..
.de H6
\\$1
..
.de H7
\\$1
..
.de H8
\\$1
..
.de H9
\\$1
..
.de DL
.in +5
..
.de DE
.in -5
..
.de DT
.ti -3
* \\$1
..
.de DD
.br
..
Date: Wed, 4 Nov 1992 16:48:34 -0500
From: Jim Davis <davis@dri.cornell.edu>
To: wei@xcf.berkeley.edu, www-talk@nxoc01.cern.ch
Subject: improved printing of WWW files
If you can't quite manage to live without hardcopy, you may wish
sometimes to print WWW files. I have written a couple of scripts to do
this. They are particularly useful with Pei Wei's excellent Viola WWW
browser.
A tar archive is available for anonymous FTP:
dri.cornell.edu/pub/davis/print-www.tar
It contains:
README
print-www
print-www.l
html-to-latex
html2latex.sed (modified version of original CERN version)
The hardest part was writing the perl script to obtain documents via
http protocol - turns out you can't just run pipes through telnet.
The conversion from HTML to LaTeX is not really robust yet - this is
doubly hard since there is no guarantee that the HTML is legal. But
at least it works for my test cases. No doubt it will be improved in
time.
best wishes
GATEWAY SOFTWARE
See also: W3 server software, W3 client software
These are servers which provide data extracted from other systems.
They are built using code from the basic daemon, or scripts.
FIND gateway for CERN/VM XFIND which calls a REXX exec to
get the information from the XFIND system
running on the CERNVM mainframe.
Hytelnet gateway A gateway to Peter Scott's list of telnet
sites
VMS Help gateway This allows any VMS help files to be made
available to WWW clients. Runs on VAX/VMS.
WAISGate A gateway to information available using the
W.A.I.S. protocol.
DCLServer A server for VMS systems which allows you to
write a gateway to your own favorite
information system using DCL.
System33 A (big) csh script server providing data
including Xerox System33 documents, man pages
in plain text, phone numbers, etc. etc...!
Oracle A generic server to Oracle. Could be used as
a basis for gateways to specific Oracle
databases.
Geography Gateway to the Geography server at U
Michigan
TechInfo TechInfo is the CWIS from MIT. A gateway
exists thanks to Linda Murphy/Upenn.
Tim BL
Geography gateway
Wed, 18 Nov 1992
Jim Davis
Here is a quickly hacked up Gateway from WWW to the University of
Michigan Geography server. It expects one argument, a WWW docid. It
ignores the "pathname", extracts the search words, then passes those
to the server. It does NOT parse the data returned by the server (that
is an improvement yet to be done) but you can understand the output.
To use this, you would need to have an HTTP server running someplace
where you can attach this gateway. I can provide the very simple HTTP
server I use here, but this subject is already documented in the WWW
online documentation.
Source code in perl
The WWW TechInfo gateway
This is a gateway built using the basic server code, plus one source
file in C. Thanks to Linda Murphy of University of Pennsylvania for
the techinfo code.
The gateway data as running at CERN
The source file
Tim BL
The W.A.I.S. - WWW gateway
This is an example of a WWW server and a WAIS client. It is just the
regular httpd daemon linked with:
a version of the libwww library which was compiled with the
DIRECT_WAIS option, and includes the HTWAIS module;
the freeWAIS libraries from CNIDR.
See a summary of some data available through the gateway.
WSRC FILES
The gateway keeps a cache of WAIS "source" files. These are files
describing WAIS servers. They are normally picked up automatically by
searching a "directory of servers" index. Once the gateway has picked
up a description of a server, it uses the description to describe
the server to those who follow links to it. (See the HTWSRC module of
libwww.)
These source files are parsed, and are kept in the directory
/usr/local/lib/WAIS under the server name, port, and database name.
Tim BL
VMS Help server
This server can provide WWW users with any information stored in VMS
Help format.
Additional information available: :->
Try me! An example server running at CERN
Status The current state, pointers to more
information
JFG
GATEWAY TO VMS HELP: INTERNALS
These are technical and installation notes about the gateway to VMS
Help. Please send bug reports and suggestions to Jean-Francois
Groff (jfg@cernvax.cern.ch).
Sources
The program consists of the generic daemon HTDaemon.c, and a
special function, stored in VMSHelpGate.c, to retrieve VMS Help
data and convert it to HTML.
Installation
The files you need are as follows. You should customise them,
putting in your own directory names:
launchgate.com Runs the server as a detached process. Put a
call to this from your sys$startup procedure,
wherever that is. This detaches a job to use
www_server.com as input, and a log file as
output.
www_server.com The server command file, a wrapper for the
actual server executable. In this file, set
the temporary directory for the storage of a
cache of .HLP files. This file runs the
executable.
test.com Here is just an example of a file to build
and test the server.
descrip.mms This is an MMS file to build the executable.
If you don't have MMS, you may be able to
figure out from looking at it which commands
you should use. You can find a machine
running MMS and generate the equivalent .com
files. See comments at the top of this file
on how to run it.
The source files and executable .EXE are currently (October 92)
available on HEP decnet in vxcrna::disk$d1:[jfg.www...]. Note
also you can pick up the master sources from dxcern:: automatically
by running
MMS /MACRO=(U=DXCERN::).
If you are not in HEP decnet, you should find the sources in the
WWWDaemon_v.vv.tar.Z file in the distribution. See the README file.
_________________________________________________________________
JFG
VMS HELP SERVER BUGS
This is a list of known bugs and desired improvements. Don't let it
shrink too fast: send your bug reports and suggestions to
Jean-Francois Groff (jfg@cernvax.cern.ch).
The keyword search works fine on any number of levels down, but
then the generic daemon doesn't know how deep the server went,
so anchor names lack the intermediate levels. Solution:
generate anchor names relative to the input path (before '?').
DANGER: Attempts to access VMS topics with a weird name like
":=" will crash the server because VMS will try to create a .HLP
file with an invalid file specification due to these special
characters. Solution: Make a good escaping system (that works
with VMS and Un*x styles as well). Crude and bulletproof
solution: Ignore any offending topic name!
Reference to another help library through @ will only search
SYS$HELP for the corresponding .HLB file.
We need an overview page that lists all help libraries
available.
__________________________________________________________ JFG
VMS HELP SERVER FEATURES
This lists the main features of the VMS Help gateway, with
improvements in reverse chronological order. Help make it grow fast:
send your bug reports and suggestions to Jean-Francois Groff
(jfg@cernvax.cern.ch).
Experimental gateway 0.4 -- 2 Oct 91
Accepts user queries by number or by name. In the latter case,
can go down several levels, for instance, from the main help
page : "cc /lib" will go to topic CC, subtopic /LIBRARY.
On invocation with only //node:port/HELP, displays the contents
of the standard VMS Help library SYS$HELP:HELPLIB.HLB (function
lis_to_html).
Address format: //node:port/HELP/[@library/][topic[/subtopic]*]
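As a reading aid for the address format, here is a sketch in Python of how such an address breaks down. It is not taken from VMSHelpGate.c; the function name parse_help_address and the shape of the result are assumptions made for illustration, and VMS qualifiers such as /LIBRARY are not treated specially.

```python
def parse_help_address(address):
    """Split //node:port/HELP/[@library/][topic[/subtopic]*] into its parts."""
    if not address.startswith("//"):
        raise ValueError("address must start with //")
    hostport, _, path = address[2:].partition("/")
    node, _, port = hostport.partition(":")
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] != "HELP":
        raise ValueError("path must start with /HELP")
    segments = segments[1:]
    library = None
    if segments and segments[0].startswith("@"):
        # An optional @library comes before any topic names.
        library = segments[0][1:]
        segments = segments[1:]
    return {"node": node,
            "port": int(port) if port else None,
            "library": library,
            "topics": segments}
```

With no library and no topics, as in //node:port/HELP, the result has library None and an empty topic list, which corresponds to the contents page of the standard help library.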
__________________________________________________________
JFG
STYLE GUIDE
This guide is designed to help you create a hypertext database which
effectively communicates your knowledge to the reader. It has been
prepared in the light of comments by readers, and many demands by
providers of online documentation. Some of the points made may be
influenced by personal preference, and some may be common sense, but
a collection of points has been demanded, and so here it is.
The guide is designed to be read sequentially, but feel free to
depart from this. The sections are as follows:
Introduction
Overall structure of your work
Within each document
Test your document
Background reading
Reader comments
This document is open to comment
Suggestions are strongly invited; if you think of anything, mail it to
timbl@info.cern.ch, mentioning the Style Guide for Online Hypertext
or its URL.
Tim BL
Introduction
You are going to write (or generate) some online hypertext. Because
hypertext is potentially unconstrained you are a little daunted. Do
not be. You can write a document as simply as you like. In many
ways, the simpler the better.
You will be writing a number of separate files. These files will be
linked to each other, and to external documents, to make your final
work.
You may think of your work as a "document", and if it were on paper,
then you would call it that. In the online case though, we tend to
refer to each individual file as a document. A document may
correspond, in the book analogy, to a section or a subsection, or
even a footnote. In this guide, we'll refer to the whole collection
as a work.
The document is the unit by which information is picked up. At any
one time, a document is completely loaded into the reader's computer.
It is also normally the amount you edit at any one time, though with
a good editor you will probably have a number of documents open at a
time.
The section on structure discusses how you organize your material
into documents. Another section discusses how to organise your
material within a document.
(Up to overview, on to structure)
Tim BL
Structure
If you have in mind a body of information to put across to your
reader, you probably have a mental organisation for it. Normally
this is a sort of hierarchical tree, like the chapters of a book if
you were to write a book.
Keep this structure. It helps readers to have a tree structure as a
basis for the book: it gives them a feeling of knowing where they
are. You can also use this structure for organising your files in
directories.
You should also bear in mind:
The reader's preconceived structure
The idea of overlapping trees
How big to make each document
(Up to overview, back to Introduction, on to: writing each document)
Tim BL
THE READER'S STRUCTURE
Remember always the audience for whom you are writing. If they are
novices in the subject, it will normally help if you are firm about
the structure of your work, so that they can learn the structure of
the knowledge itself. For example, if you feel that the subject
falls into three distinct areas, then that is an important thing to
teach.
If, however, your readers will already have some knowledge in the
subject, then they will already have formed their own structure for
it. In this case they will consciously or subconsciously know where
they expect to find things. If your structure is different from
theirs, enforcing it too strongly will confuse them and put them
off.
You may in this case have to resist a strong tendency to put across your
own structure strongly and to the detriment of all others. There are
two solutions.
If you have a single well-defined audience in mind, who will share a
similar world view, then try to write exactly for that world view
rather than yours.
If you are simultaneously writing for more than one group, then you
must provide for both.
When you make a reference, qualify it with a clue to allow some
people to skip it. For example, "If you really want to know how it
works inside, see the Internals guide", or "A step-by-step
introduction is in the tutorial".
Provide links for both readers' views. Your work will be more
connected than a simple tree, but with proper qualification, no one
should get lost.
Provide two separate tree "roots". For example, you can write a
step-by-step tutorial and a functionally direct reference tree for
the same data. Both will at the lowest level have the same data, but
while the first will deal with the simple things first, the second
may be functionally grouped. This is just like having several
indexes to a book. The tutorial might also include information which
the reference work does not.
(Up to overview, back to Introduction, on to: writing each document)
Tim BL
OVERLAPPING TREES
Here is an example of a work (describing some programming functions,
say) with two separate structures:
        Tutorial                        Reference
            |                               |
  Let's do it together              -----------------
from simple to difficult            |               |
            |                 by Functional   Alphabetical
            |                     group          by name
 Task oriented examples             |               |
            |                       -----------------
            |                               |
   Examples of use of             Syntax definition for
   specific functions <-------->   specific functions
The novice user starts at the top left, and works his way down. Where
he needs specific details, he will get down to the examples and from
them a link to the underlying definitive descriptions of each. As far
as he is concerned, he is reading a tree-structured work. In fact,
he is reading the same information as the expert who, coming in to
check on one particular function, then looks up an example of its
use.
(Up to structure, back to user's structure, on to: document size)
Tim BL
HOW BIG TO MAKE EACH DOCUMENT
The most important point here is that a document should put across a
well-defined concept. It is not generally worth splitting one idea
arbitrarily into two bits in order to make the bits smaller. Nor is
it a good idea to put together ideas which are really separate just
to make a bigger document.
A document can be as small as a footnote.
There are two upper limits on a document's size. One is that long
documents will take longer to transfer, and so a reader will not be
able to simply jump to it and back as fast as he or she can think.
This depends a lot on the link speed of course.
The other limit is the difficulty for a reader to scroll through
large documents. Readers with character based terminals don't generally
read more than a few screens. They often only absorb what is on the
first screen, as if that is not interesting they won't be bothered to
scroll down. Readers are also put off by being left at the top of a
large document.
Readers with graphic interfaces generally scroll through long
documents with a scroll bar. When the scroll bar is moved a small
amount, the document should move a sufficiently small amount so that
some of the original window-full is still left in the window. This
allows the reader to scan the document. If the document is any
bigger, then it is basically unreadable, in that any movement of the
scroll bar loses the place and leaves the reader disoriented.
Advantages with longer documents are that it is easier for readers
with scrollbars to read through in an uninterrupted flow, if that is
how the document is written.
Also, one doesn't have to go to the trouble of making (or
generating) so many links and keeping them up to date if things are
altered. If making the links is a problem, just settle for one link
to a contents page. Some browsers have "next" and "previous" buttons
to allow a document to be browsed serially according to a list.
(In fact, one can normally scroll up and down explicitly page by
page, but this gives the same feeling as the terminal interface.)
A rough guide, then, for the size of a document is:
For online help, menus giving access to other things: small
enough to fit on 24 lines. Check this by using a terminal
browser.
For textual documents, of the order of half a letter-sized (A4)
page to 5 pages.
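If no terminal browser is to hand, the 24-line check can be
approximated with a few lines of script. The following is only a rough
sketch: the function names are invented for illustration, and an 80
column terminal is assumed, which the guide itself does not state.

```python
import textwrap


def screen_lines(text, width=80):
    """Count the rows `text` occupies on a terminal `width` columns wide."""
    total = 0
    for line in text.splitlines():
        # An empty line still takes one row; a long line wraps onto several.
        wrapped = textwrap.wrap(line, width=width) or [""]
        total += len(wrapped)
    return total


def fits_on_screen(text, rows=24, width=80):
    """True if the rendered text needs no scrolling on a rows x width terminal."""
    return screen_lines(text, width) <= rows
```

This works on the text as displayed, not on the HTML source, so the
document should first be rendered to plain text by whatever means you
have.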
(Up to structure , back to overlapping trees , on to: within each
document )
Tim BL
Within each document
This section of the style guide deals with the layout of text within
a "document", the unit of retrieval of information on the web.
To be completed.
You should try to:
Sign your work
Give its status
Make links into context .
Use context-free document titles
Format device-independently
Write for the printed work too
Write readable text despite the links
(up to overview , back to structure , on to testing )
Tim BL
SIGN IT!
An important aspect of information which helps keep it up to date is
that one can trace its author. Doing this with hypertext is easy --
all you have to do is put a link to a page about the author (or
simply to the author's phone book entry).
Make a page for yourself with your mail address and phone number. At
the bottom of files for which you are responsible, put a small note
-- say just your initials -- and link it to that page. The address
style (typically right justified) is useful for this.
Your author page is also a convenient place to put any disclaimers,
copyright notices, etc. which law or convention requires. It saves
cluttering up the messages themselves with a long signature.
If you are using the NeXT hypertext editor, then you can put this
link from your default blank page so that it turns up on the bottom
of each new document.
( up , back to ..., on to giving your document's status)
THE STATUS OF YOUR DOCUMENT
Some information is definitive, some is hastily put together and
incomplete. Both are useful to readers, so do not be shy to put
information up which is incomplete or out of date -- it may be the
best there is. However, do remember to state what the status is. When
was it last updated? Is it complete? What is its scope? For a phone
book for example, what set of people are in it?
Not every document needs a status declaration, if there is something
in the overview page of the work which covers it.
You can of course also give a feel for the status of the text by its
language ... bad spelling, missing capitals, and relaxed grammar all
indicate informal notes. Careful use of verbs such as "shall" and
"should", and the introduction of Long Capitalised Noun Phrases
(LCNPs) will give at least the impression of an ISO standard. ;-)
Date it
In some cases it can be useful to put creation dates and last
modified dates on your work. (Note that this is the sort of thing
which one could make a server do automatically with a little
programming).
Figure out whether putting one might later save the reader from
following out of date information.
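As an illustration of the "little programming" involved, here is a
minimal sketch which takes the last-modified date from the file system
and appends it to a document as it is read. It is not taken from any
existing server; the function names and the wording of the note are
assumptions made for the example.

```python
import os
import time


def last_modified_note(path):
    """Return a one-line note giving the file's last-modified date (UTC)."""
    mtime = os.path.getmtime(path)
    return "Last modified: " + time.strftime("%Y-%m-%d", time.gmtime(mtime))


def with_date(path):
    """Return the document text with the date note appended at the bottom."""
    with open(path, encoding="utf-8") as f:
        body = f.read()
    return body.rstrip("\n") + "\n\n" + last_modified_note(path) + "\n"
```

The file modification time is only a guide: it changes when a file is
copied or touched, even if the content did not change.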
(back to Sign It, On to links into context )
LINKING TO CONTEXT
A major difference between writing part of a serial text, and an
online document, is that your readers may have jumped in from
anywhere. Even though you have only made links to it from one
place, any other person may want to refer to that particular point,
and so will make a link to that particular part of your work from
their own. So you can't rely on your reader having followed your
path through your work.
Of course if you are writing a tutorial, it will be important to keep
the flow from one document to the next in the order you intended for
its primary audience. You may not wish to cater specially for those
who jump in out of the blue, but it is wise to leave them with enough
clues so as not to be hopelessly lost. Some ways of doing this are:
Watch that your text and vocabulary stand by themselves. Starting a
document with "The next thing we consider is..." or "The only
solution to this problem is..." will certainly confuse.
Sometimes the opening words refer to the context, and can be
linked to background information. For example, in the WWW
project documentation, the first occurrence of the acronym WWW is
often linked back to the central project document.
The navigation hints at the top or bottom of the document can
give explicit pointers. Examples are at the bottom of this
document.
It can also be useful to imagine as you are writing that you
yourself may wish to reuse the document some day.
(Part of style guide for online hypertext . Up to Writing each
document , on to Title tag)
Tim BL
DEVICE INDEPENDENCE
The hypertext you write is stored in the HTML language, which does not
contain information about the fonts and paragraph shapes and spacing
which should be used for displaying the document.
This gives great advantages in that your document will be rendered
successfully on whatever platform it is viewed, including a plain
text terminal.
You should be aware that different clients do use different spacing
and fonts. You should be careful to use the structuring elements
such as headers and lists in the way in which they were intended. If
you don't like the rendering on your particular client, don't try to
fix it by using inappropriate elements, or trying for example to
force extra spacing with empty elements. This may well end up being
interpreted differently by other clients and looking very strange.
You can in many cases configure how the client displays each element.
For example:
Always use heading levels in order, with one heading level 1 at
the top of the document, and if necessary several level 2
headings, and then if necessary several level 3 headings under
each level 2 heading. If you don't like the way heading level 2
is formatted, fix it on your client, don't just skip to heading
level 3.
Don't put extra spaces or blank lines into your text to pad it
out, except in preformatted (PRE) sections.
Don't refer in your text to facets of particular browsers.
Asking someone to "click here" won't make sense without a mouse,
just as asking someone to "select a link by number" will betray
the fact that you were using the line mode browser. Just leave
a link. The instructions get boring as the user will normally
know how to select a link.
See also: testing your document .
Following these guidelines you may find that the end result does not
appear on your screen exactly as you would like, but your readers
will probably be happier.
(Part of the Style Guide for Online Hypertext . Up to within each
document , back to , on to printable hypertext)
Tim BL
PRINTABLE HYPERTEXT
In an ideal world, paper might not be necessary. In a next to ideal
world, one would have enough time to write a hypertext version of a
document and also completely reauthor a paper version. In the real
world, you will probably want to generate any printed documents and
online documents from the same file.
Suppose the HTML files will be the master, and you will generate the
printable from this, by translation into TeX, etc.
If you might one day want to do this, try to avoid references in the
text to online aspects. "See the section on device independence" is
better than "For more on device independence, click here.". In fact
we are talking about a form of device independence.
Unfortunately the recommended practices of signing each document and
giving navigational links tend to mess up the printable copy, though
one can of course develop ways of stripping them out if they follow a
common format.
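As a sketch of such stripping, suppose (and this is only an assumption
about your own conventions) that both the signature and the
navigational links are always written in ADDRESS elements. They can
then be removed before translation with one regular expression. The
function name is hypothetical.

```python
import re

# Matches one ADDRESS element and any white space that follows it.
_ADDRESS = re.compile(r"<address\b[^>]*>.*?</address>\s*",
                      re.IGNORECASE | re.DOTALL)


def strip_for_print(html):
    """Return `html` with every ADDRESS element removed."""
    return _ADDRESS.sub("", html)
```

A regular expression is enough here only because ADDRESS elements are
not nested. A different common format would need a different pattern.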
(Up to: within each document; back to device independence, on to
...)
Tim BL
Test your document
In a way your hypertext is like a book, which you should have
proof-read. In a way, it is like a program which you should have
tested. At least get someone from the group for which you wrote the
document to read it and give you some feedback. Other ideas are:
Read the document with several different client programs, to ensure
that you have formatted it in a device independent way.
Monitor the readership of your document. You can do this by
analysing the server log files . You may find that some parts
are not being read, perhaps because people are looking in the
wrong place for them. You may see that people often follow a
path and backtrack. If you can guess what they were looking for,
you can make the clues around the link more helpful. (Remember
to keep log information confidential until you have removed user
information from it.)
Make it clear whether you will accept criticism or suggestions
from your readers, and how they should send it.
Ask people to solve problems using the document, and report on
their success. If they fail, find out what they were looking
for, whether it was in the document at all,
HOW MUCH TESTING?
Testing takes time. The decision of how much testing you do is
based on the quality of the document you wish to provide. You are
balancing your reader's time and effort against yours. If your
document is "selling" an idea, or if you are selling the document or
providing a service, you will want to make it as easy as possible
for the reader. If many people will read your work, a little of
your time will save a lot of theirs.
If however you are documenting some obscure part of a system in which
no one other than yourself is likely to be interested, or if you
feel that your readers are lucky to have anything available at all,
there is no point wasting time testing it. In the event of someone
needing the information, they might have to go to some extra trouble
to follow several links to find what they want, and then to
understand what you have written. This may be the most efficient way
of working. I emphasize this because there is very much information
which is for a fleeting moment in people's minds, or is hastily
scribbled down on some file, and which may be important to posterity.
It is better for this information to be available even in unpolished
form than for it to be hidden out of embarrassment for its form.
Before electronic technology, the effort of publishing was such that
this information was never seen, and it was a waste, and
considered an insult to one's readers, to publish something which was
not of high quality. Nowadays, there is "publishing" at all levels,
and both high quality and hasty documents have their value. It is
important, though, to make it clear what the quality of a document is
when making a reference to it, to avoid disappointment.
Monitoring the server log files will tell you which documents are
really being read. You can use your time most efficiently to improve
the quality of those. Of course, analysing the server log files also
takes time!
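A small script keeps the cost of log analysis down. The sketch below
counts requests per document and blanks out the user information so
that the log can be passed on. The log layout is an assumption made
for the example (fields separated by white space, requesting host
first, document path last); check what your own server actually
writes. The function names are hypothetical.

```python
from collections import Counter


def parse(line):
    """Split one log line into (host, path); return None for unusable lines."""
    fields = line.split()
    if len(fields) < 2:
        return None
    return fields[0], fields[-1]


def readership(lines):
    """Count how many times each document path was requested."""
    counts = Counter()
    for line in lines:
        parsed = parse(line)
        if parsed:
            counts[parsed[1]] += 1
    return counts


def anonymise(line):
    """Replace the host field with "-" so the line no longer names a user."""
    fields = line.split()
    if not fields:
        return line
    fields[0] = "-"
    return " ".join(fields)
```

Calling readership(open("logfile")).most_common(10) would then list
the ten most requested documents, where "logfile" stands for wherever
your server keeps its log.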
(Part of the Style Guide for Online Hypertext . Back to Within each
document, On to Background reading)
Tim BL
Background reading
Some other documents which may be of relevance, if you are reading
the Style Guide for Online Hypertext :
The HTML Specification and references from it
A Beginner's Guide to writing HTML
World-Wide Web server software - a list of pointers
Web Etiquette -- for Server Administrators
(Back to testing, on to ...)
MAIL ROBOT
The mail robot is a program which will accept incoming mail and allow
remote users to:
Subscribe to mailing lists (and unsubscribe)
Retrieve information given a W3 address (URL)
Originally from UC Berkeley, an enhanced robot is distributed as part
of the world-wide web global information initiative. Further
information available is:
Help The help file for users of the robot service
Installation Installation instructions for unix system
managers
Bugs Lists of improvements requested or needed.
Change history A list of features introduced and bugs
fixed.
See also Other WWW software
Using the W3 mailing robot
This robot maintains the W3 mailing lists, and allows W3 documents to
be retrieved on request.
You can subscribe or unsubscribe to any of the various WWW mailing
lists by sending email to the robot "listserv@info.cern.ch" -- see
the commands listed below.
If you have any problems, requests or questions for a human being,
mail "www-request@info.cern.ch". Lists are:
www-announce Anyone interested in WWW, who would like
information about new releases or new online
data available. Please refrain from posting
administrivia to this large list !
www-talk Developers of WWW code, or those interested
in discussions of technical details
You can also find information on WWW (as well as many other things!)
by telnetting to info.cern.ch (no username, no password).
If you want to pick up the WWW software, then use anonymous FTP to
info.cern.ch and look in directory /pub/www. Subdirectories are src
for the latest source packages, bin for executables for various
machines, doc for "paper copies" of articles on WWW in PostScript and
ASCII form. To read the latest documentation, use WWW !
COMMANDS
The commands understood by the listserv program are:
HELP lists this file. This is also sent whenever
a message to listserv is received from which
no valid command could be parsed.
HELP groupname lists a brief description of the group
requested.
ADD listname Add yourself to the list
DELETE listname take yourself off the list
ADD address listname Add yourself with a given mail address to
the given list. The address must not contain
spaces!
DELETE address listname
Remove the given name from the given list.
For all ADD/DELETE commands, mail is sent to
the address given to confirm the add or
delete operation.
SEND document-address returns a document with the requested W3
address.
STOP Stop processing requests: ignore the rest of
the message. Needed if you send a signature
on the end of your message (or if some
gateway adds one). If in doubt, use it.
A command must be the first word on each line in the message. Lines
which do not start with a command word are ignored. If no commands
were found in the entire message, this help file will be returned to
you. A single message may contain multiple commands; a separate
response will be sent for each.
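The parsing rules above can be sketched as follows. This is an
illustration of the rules as described here, not the code of the robot
itself (which is written in C). The function name is hypothetical, and
matching commands without regard to case is inferred from the
lower-case examples in this file.

```python
COMMANDS = {"HELP", "ADD", "DELETE", "SEND", "STOP"}


def parse_message(body):
    """Return the (command, arguments) pairs found in a message body."""
    requests = []
    for line in body.splitlines():
        words = line.split()
        if not words:
            continue
        command = words[0].upper()
        if command not in COMMANDS:
            continue    # lines not starting with a command word are ignored
        if command == "STOP":
            break       # ignore the rest of the message, e.g. a signature
        requests.append((command, words[1:]))
    if not requests:
        # No valid command in the whole message: send the help file.
        requests.append(("HELP", []))
    return requests
```

Each pair in the result would give rise to one response message.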
Examples
add www-announce
add me@host.uni.edu www-announce
delete me@host.uni.edu www-talk
send http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html
SUBSCRIPTION
If you are not sending mail from your preferred mail address, then
you can use the second form of the command to give your mail address.
If you are not on the internet, please convert your address into arpa
style. (For example, UK users please use international ordering:
joe@host.ac.uk.) Just specify the mailbox, without any spaces.
If you omit the 'address' the command will assume the mailbox that is
in the From: line of the message. Note that SUBSCRIBE is a synonym
for ADD; UNSUBSCRIBE for DELETE.
Please note that it IS possible to add or delete someone else's
subscription to a mailing list. This facility is provided so that
subscribers may alter their own subscriptions from a new or different
computer account. There is therefore some potential for abuse; we
have chosen to limit this by mailing a confirmation notification of
any addition or deletion to the address added or deleted including a
copy of the message which requested the operation. At least you can
find out who's doing it to you.
Note that although you would mail submissions to a mailing list by
addressing mail to e.g., www-talk@info.cern.ch, in a subscription
request you specify the name of the list simply (without the
@hostname part) as in the first example above.
RETRIEVING DOCUMENTS
The SEND command (or the WWW command which is equivalent) returns the
document with the given W3 address, subject to certain restrictions.
Hypertext documents are formatted to 72 character width, with links
numbered. A separate list at the end gives the document-addresses of
the related documents.
If the document is hypertext, its links will be marked by numbers in
brackets, and a list of document addresses by number will be appended
to the message. In this way, you can navigate through the web, albeit
only at mail speed.
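To make the numbering concrete, here is a rough sketch of the idea.
The robot itself delegates this work to the line mode browser. The
code below is a stand-in written with Python's standard html.parser,
with invented names, and it runs the whole document into a single
paragraph where the real browser keeps the document's layout.

```python
from html.parser import HTMLParser
import textwrap


class LinkNumberer(HTMLParser):
    """Collect the running text, marking each link with its number."""

    def __init__(self):
        super().__init__()
        self.parts = []       # pieces of running text
        self.refs = []        # link addresses, in order of appearance
        self.in_link = False

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.refs.append(href)
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.parts.append("[%d]" % len(self.refs))
            self.in_link = False

    def handle_data(self, data):
        self.parts.append(data)


def render_for_mail(html, width=72):
    """Return plain text with numbered links and a list of addresses."""
    parser = LinkNumberer()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    lines = textwrap.wrap(text, width=width)
    if parser.refs:
        lines.append("")
        lines.extend("[%d] %s" % (n, ref)
                     for n, ref in enumerate(parser.refs, 1))
    return "\n".join(lines)
```

To follow link 2, say, the reader mails back a SEND command with the
second address in the list.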
If you don't know where to start, try asking for one of
http://info.cern.ch./hypertext/DataSources/bySubject/Overview.html
http://info.cern.ch./hypertext/DataSources/bySubject/Physics/HEP.html
http://info.cern.ch./hypertext/WWW/TheProject.html
for lists of further pointers.
CAUTIONARY NOTE
As the robot gives potential mail access to a *vast* amount of
information, we must emphasise that the service should not be abused.
Examples of appropriate use would be:
Accessing any information about W3 itself;
Accessing any CERN and/or physics-related or network development
related information;
Examples of INappropriate use would be:
Attempting to retrieve binaries or .tar files or anything more
than directory listings or short ASCII files from FTP archive
sites;
Reading internet newsgroups which your site doesn't take;
Repeated automatic use;
There is currently a 1000 line limit on any returned file. We don't
want to overload other people's mail relays or our server. We reserve
the right to withdraw the service at any time. We are currently
monitoring all use of the server, so your reading will not initially
enjoy privacy. End of cautionary note.
Enjoy!
The W3 team at CERN (www-bug@info.cern.ch)
Installation
Here are the steps necessary to install the Mail Robot product on
your unix system.
CUSTOMISATION
Set up the variables in listserv.h and CommonMakefile to suit your
site.
POSTMASTER The address from which messages appear to
come. Why not listserv? Perhaps to prevent
mail loops.
SECUREWWW The executable W3 line mode browser (v1.3 or
later, so as to have the -listrefs option).
This is a separate product. For security, www
should be writable only by root.
SERVERDIR The directory in which you want to put your
mailing lists and help about them.
COMPILE THE PROGRAMS
Everything compiled on AEM's MicroVax II running ULTRIX 3.0 then
TBL's NeXT without any problem at all. Your results may vary.
CREATE YOUR SERVDIR
wherever you specified in listserv.h. Install a HELP file, perhaps
using the example-files/HELP in this directory as a template.
SET UP AN ALIAS "LISTSERV"
Make an alias in your /etc/aliases (or /etc/sendmail/aliases,
whatever you have) that points to this program, for example:
listserv: "|/usr/local/mail/listserv"
robot: "|/usr/local/mail/listserv"
FOR EACH MAILING LIST
Create a name.info file giving a bit of information about that
mailing list. See the *.info files in the example-files subdirectory.
Create a name file in the same directory, consisting of email
addresses one to a line of subscribers to a group. If it is for a
brand-new group, create an empty file. Remember that this file must
be writable by the mail daemon. The name of the file is just the name
of the group.
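Since the list file is simply one address per line, it is easy to
inspect or repair by script. The sketch below shows the kind of update
an ADD or DELETE amounts to. It is an illustration with hypothetical
function names, not the robot's own code, and it does no file locking.

```python
def read_list(path):
    """Return the subscriber addresses in the list file, one per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def write_list(path, addresses):
    """Rewrite the list file with the given addresses."""
    with open(path, "w") as f:
        for address in addresses:
            f.write(address + "\n")


def add_address(path, address):
    """Add an address unless it is already subscribed."""
    if " " in address:
        raise ValueError("the address must not contain spaces")
    addresses = read_list(path)
    if address not in addresses:
        addresses.append(address)
        write_list(path, addresses)


def delete_address(path, address):
    """Remove every occurrence of the address from the list."""
    remaining = [a for a in read_list(path) if a != address]
    write_list(path, remaining)
```
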
Depending on how you have your mailing lists set up, you may need to
add an alias to the /etc/aliases file for each of the mailing lists.
For example:
real-recipes: :include:/usr/local/mail/maillists/recipes
So sending mail to real-recipes actually goes to each of the
subscribers listed in /usr/local/mail/maillists/recipes
INSTALL LISTSERV
Install in the appropriate directory. Edit the CommonMakefile and
then
make install
RUN NEWALIASES
This gets sendmail to read the changes in /etc/aliases.
newaliases
TRY IT OUT
Send mail to listserv with body
HELP
for example. You should get a plain text version of the help file.
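The same test can be scripted. The sketch below builds the message
with Python's standard email package and hands it to a mail server
with smtplib. The sender address, the "localhost" mail host and the
function names are all assumptions for the example, so substitute
whatever suits your site.

```python
from email.message import EmailMessage
import smtplib


def help_request(sender, robot="listserv"):
    """Build a message to the robot whose body is the single command HELP."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = robot
    msg["Subject"] = "help"
    msg.set_content("HELP\n")
    return msg


def send(msg, host="localhost"):
    """Hand the message to the SMTP server on `host` (assumed to exist)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

For example, send(help_request("me@host.uni.edu")) should bring the
help file back to that mailbox.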
Mail Robot
This is a "listserv" type program which maintains mailing lists, and
allows W3 documents to be retrieved by electronic mail.
Author: Various, modified by TBL.
Status: Source available by anonymous FTP. (Oct 92)
Current version: 1.0
Platforms: Unix only.
More information: Overview , Bugs , change history .
Bugs
This is a list of bugs in or improvements desired in the Mail Robot.
See also the list of bug fixes .
The INDEX command ought to be implemented, but for some reason
always returns an empty list. Occasionally it seems to work.
Change History
Changes to the Mail Robot , in reverse chronological order:
OCTOBER 1992
TBL added information retrieval possibility using WWW. Release as an
unsupported W3 product to those who ask for it.
1991
TBL rewrote str.c (used to overwrite its arguments).
AEM
A. E. Mossberg, aem@mthvax.cs.miami.edu made a couple minor changes,
to make it slightly less UCSD-specific. He also added a README, and
example files in the subdirectory example-files.
ORIGIN
Note this is NOT the bitnet LISTSERV program. The term "mail robot"
is used to attempt to prevent confusion between these two products,
which have different functionality although they do basically the
same sort of thing.
This was the UCSD listserv program, which AEM retrieved from ucsd.edu
by anonymous ftp, and TBL retrieved from ftp.eff.org. As retrieved from
file://ftp.eff.org/pub/listserv2.shar, it consisted of the following
files:
README
Makefile
commands.c
listserv.h
main.c
str.c
subscribe.c